Representation of Target Classes for Text Classification - AMRITA_CEN_NLP@RusProfiling PAN 2017
Authors
Abstract
This working note describes the system we used in the RusProfiling PAN 2017 shared task. The objective of the task is to identify the gender of an author from text the author has written in Russian. Treating this as a binary text classification problem, we experimented with a representation scheme for target classes (called class vectors), built from the texts belonging to the corresponding target classes. These class vectors are computed with the traditional representation methods available in Vector Space Models and Vector Space Models of Semantics. Following the representation step, a Support Vector Machine with a linear kernel performs the final classification. For this task, the RusProfiling PAN 2017 shared task organizers provided a genre-independent corpus. The proposed model attains almost equal performance across all the genres available in the test corpus.
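The abstract does not spell out how a class vector is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a class vector is the mean of the term-frequency vectors of the training texts in that class, and that each document is then described by its cosine similarity to every class vector. The helper names (`tf_vector`, `class_vectors`, `similarity_features`) and the toy English data are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text, vocab):
    """Term-frequency vector of `text` over a fixed vocabulary (hypothetical helper)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def class_vectors(texts, labels, vocab):
    """ASSUMPTION: a class vector is the mean document vector of its class."""
    sums = defaultdict(lambda: [0.0] * len(vocab))
    counts = Counter(labels)
    for text, label in zip(texts, labels):
        vec = tf_vector(text, vocab)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def similarity_features(text, vocab, cvecs):
    """One feature per class: similarity of the document to that class vector."""
    doc = tf_vector(text, vocab)
    return [cosine(doc, cvecs[label]) for label in sorted(cvecs)]


# Toy data, invented for illustration only (not from the shared-task corpus).
train_texts = ["red apple red", "green apple", "blue sky blue", "grey sky"]
train_labels = ["a", "a", "b", "b"]
vocab = sorted({word for text in train_texts for word in text.lower().split()})

cvecs = class_vectors(train_texts, train_labels, vocab)
features = similarity_features("red green apple", vocab, cvecs)
```

In the pipeline the abstract describes, feature rows of this kind would then be passed to a Support Vector Machine with a linear kernel; that final step is omitted here to keep the sketch dependency-free.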
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Overview of the RUSProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian
Author profiling consists of predicting some author’s traits (e.g. age, gender, personality) from her writing. After addressing at PAN@CLEF mainly age and gender identification, in this RusProfiling PAN@FIRE track we have addressed the problem of predicting author’s gender in Russian from a cross-genre perspective: given a training set on Twitter, the systems have been evaluated on five differe...
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A lot of news spreads on the web. A text classifier can categorize news automatically, and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
NLP CEN AMRITA @ SMM4H: Health Care Text Classification through Class Embeddings
Artificial Intelligence has been a major breakthrough in many domains. It has now started automating the health care domain through Natural Language Processing and Computer Vision applications. As part of this, researchers are focusing more on mining health-related information from text shared through social media and clinical trials. This paper describes our system for health care te...
Distributed Representation in Information Retrieval - AMRITA_CEN_NLP@IRLeD 2017
In this contemporary research era, the science of retrieving required information from the stored database is extending its applications in the legal and life science domains. With the exponential growth of the digital data available in the legal domain as an electronic media, there is a great demand for efficient and effective ways to retrieve required information from the stored document coll...